Skip to content

feat: Add SVE kernels for TopKV#1256

Open
morgolock wants to merge 1 commit intomainfrom
pr/topkv_sve_kernels
Open

feat: Add SVE kernels for TopKV#1256
morgolock wants to merge 1 commit intomainfrom
pr/topkv_sve_kernels

Conversation

@morgolock
Copy link
Contributor

Change-Id: I7a0c7bd1154b9cb7f35c7fd1c3b8ad54698f8799

filelist.json Outdated
"src/cpu/kernels/topkv/generic/neon/qasymm8_signed.cpp"
]
}
"files": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the indentation change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in next patch

@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from 1deb127 to 7944860 Compare February 6, 2026 16:12
@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from 7944860 to 21ba8e9 Compare March 2, 2026 12:56
inline uint32_t count_gt_block<float16_t>(const float16_t *ptr, float16_t thr, uint32_t block_elems)
{
const svbool_t pg = svwhilelt_b16(static_cast<uint64_t>(0), static_cast<uint64_t>(block_elems));
const svfloat16_t v = svld1_f16(pg, ptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have two questions to ask:

  • Why do we convert to Fp32 in the Neon(TM) implementation?
  • We should incorporate epsilon in both implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in next patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we convert to Fp32 in the Neon(TM) implementation

We could do the computation in f16 but the epsilon must be f32 to align with ref. I can address this in a different patch. It's not a serious problem, if anything there is room to optimize the neon kernel even further.

@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from 21ba8e9 to 36f1ac1 Compare March 9, 2026 14:28
@morgolock morgolock requested a review from gunes-arm March 9, 2026 14:29
@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from 36f1ac1 to c6b4071 Compare March 10, 2026 11:49
Resolves MLCE-1719

Change-Id: I7a0c7bd1154b9cb7f35c7fd1c3b8ad54698f8799
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
@morgolock morgolock force-pushed the pr/topkv_sve_kernels branch from c6b4071 to 98ab9ec Compare March 10, 2026 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants